
    A short survey on modern virtual environments that utilize AI and synthetic data

    Within a rather abstract computational framework, Artificial Intelligence (AI) may be defined as intelligence exhibited by machines. In computer science, though, the field of AI research defines itself as the study of “intelligent agents.” In this context, interaction with popular virtual environments, as for instance in virtual game playing, has recently gained a lot of focus, in the sense that it offers aspects of AI perception that had not previously occurred to researchers. Such aspects are typically formed by the computationally intelligent behavior captured through interaction with the virtual environment, as well as by the study of graphic models and biologically inspired learning techniques such as evolutionary computation, neural networks, and reinforcement learning. In this short survey paper, we attempt to provide an overview of the most recent research works in these novel, yet quite interesting, research domains. We feel that this topic, which has come into sight over the last few years, forms an attractive candidate for fellow researchers. Thus, we initiate our study by presenting a brief overview of our motivation, and continue with some basic information on the recent utilization of virtual graphic models and the state of the art in virtual environments, which constitute the two clearly identifiable components of the summarization attempted herein. We then continue by briefly reviewing the video games territory and by discerning its useful types, thus envisioning possible further utilization scenarios for the collected information. A short discussion on the identified trends and a couple of future research directions conclude the paper.

    Fusing MPEG-7 visual descriptors for image classification

    This paper proposes three content-based image classification techniques based on fusing various low-level MPEG-7 visual descriptors. Fusion is necessary because the descriptors would otherwise be incompatible and inappropriate to combine directly, e.g., within a Euclidean distance. Three approaches are described: a “merging” fusion combined with an SVM classifier, a back-propagation fusion combined with a KNN classifier, and a Fuzzy-ART neurofuzzy network. In the latter case, fuzzy rules can be extracted in an effort to bridge the “semantic gap” between the low-level descriptors and the high-level semantics of an image. All networks were evaluated using content from the repository of the aceMedia project, and more specifically on a beach/urban scene classification problem.
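
    A minimal sketch of the first variant, “merging” fusion, under illustrative assumptions: the per-image MPEG-7 descriptors below are random stand-ins (real ones would come from an MPEG-7 extraction tool), and their names, dimensions, and the beach/urban labels are hypothetical. The point is the mechanics: per-descriptor normalization followed by concatenation, so that otherwise incompatible value ranges can feed a single SVM.

```python
# "Merging" fusion sketch: normalize each MPEG-7 descriptor separately,
# concatenate into one feature vector, then train an SVM.
# Descriptor shapes and labels below are illustrative assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_images = 200

# Stand-ins for per-image MPEG-7 descriptors of differing dimensionality.
color_layout   = rng.random((n_images, 12))   # e.g., DCT coefficients
edge_histogram = rng.random((n_images, 80))   # e.g., 5 bins x 16 sub-images
labels = rng.integers(0, 2, n_images)         # 0 = beach, 1 = urban (assumed)

# Per-descriptor scaling, then plain concatenation. Scaling matters because
# the raw descriptor value ranges are otherwise incompatible.
fused = np.hstack([StandardScaler().fit_transform(color_layout),
                   StandardScaler().fit_transform(edge_histogram)])

clf = make_pipeline(SVC(kernel="rbf")).fit(fused[:150], labels[:150])
print("held-out accuracy:", clf.score(fused[150:], labels[150:]))
```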

    Video Summarization Based on Feature Fusion and Data Augmentation

    During the last few years, several technological advances have led to an increase in the creation and consumption of audiovisual multimedia content. Users are overexposed to videos via social media, video-sharing websites, and mobile phone applications. For efficient browsing, searching, and navigation across multimedia collections and repositories, e.g., for finding videos relevant to a particular topic or interest, this ever-increasing content should be described by informative yet concise content representations. A common solution to this problem is the construction of a brief summary of a video, which can be presented to the user instead of the full video, so that she/he can decide whether to watch or ignore the whole video. Such summaries are ideally more expressive than alternatives such as brief textual descriptions or keywords. In this work, the video summarization problem is approached as a supervised classification task that relies on feature fusion of audio and visual data. Specifically, the goal of this work is to generate dynamic video summaries, i.e., compositions of parts of the original video that include its most essential segments while preserving the original temporal sequence. This work relies on datasets annotated on a per-frame basis, wherein parts of videos are annotated as “informative” or “noninformative”, with the latter being excluded from the produced summary. The novelties of the proposed approach are: (a) prior to classification, a transfer learning strategy is employed to use deep features from pretrained models, which serve as input to the classifiers, making them more intuitive and robust; and (b) the training dataset is augmented with other publicly available datasets. The proposed approach is evaluated on three datasets of user-generated videos, and it is demonstrated that deep features and data augmentation improve the accuracy of video summaries with respect to human annotations. Moreover, the approach is domain independent, can be used on any video, and can be extended to rely on richer feature representations or to include other data modalities.
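
    The following sketch illustrates the general shape of such a per-frame pipeline, not the paper's exact method: deep features from a pretrained CNN (here ResNet-18, an assumed stand-in for whichever pretrained models the authors used) feed a binary frame classifier, and frames predicted “informative” would compose the dynamic summary. The audio branch of the fusion, the classifier training, and the mapping of class indices to labels are omitted or assumed for brevity.

```python
# Transfer-learning sketch: a pretrained CNN supplies per-frame deep
# features; a binary classifier marks frames as informative or not.
# Model choice and the random "frames" are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()           # expose the 512-d penultimate features
backbone.eval()

classifier = nn.Linear(512, 2)        # informative vs. noninformative
                                      # (untrained here; would be fit on the
                                      # per-frame annotations)

frames = torch.rand(16, 3, 224, 224)  # stand-in for preprocessed video frames
with torch.no_grad():
    feats = backbone(frames)          # (16, 512) deep features
logits = classifier(feats)
keep = logits.argmax(dim=1) == 0      # assume class 0 = "informative"
print("frames kept for the summary:", keep.nonzero().flatten().tolist())
```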

    Speaker Verification based on extraction of Deep Features

    In this paper we present an approach for speaker verification based on the extraction of deep features. More specifically, we propose a scheme based on a convolutional neural network. For audio representation we opt for spectrograms, i.e., images that result from the spectral content of voices. Our network is trained to extract visual features from these spectrograms. We demonstrate that our network is able to produce discriminative features for the problem at hand and, moreover, that when transfer learning is used, few samples may be needed for accurate speaker verification.
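
    A minimal sketch of the verification idea under stated assumptions: the toy CNN below is an untrained, illustrative stand-in for the paper's network, and the 0.7 decision threshold is arbitrary. It shows the pipeline shape only: waveform, to log-mel spectrogram “image”, to CNN embedding, to a cosine-similarity comparison between two utterances.

```python
# Spectrogram-based verification sketch: embed two utterances with a small
# CNN and accept them as the same speaker above a similarity threshold.
# The network weights are random here; a real system would train them.
import numpy as np
import librosa
import torch
import torch.nn as nn

def spectrogram(wave, sr=16000):
    # Log-mel spectrogram as the "image" representation of the voice.
    mel = librosa.feature.melspectrogram(y=wave, sr=sr, n_mels=64)
    return torch.tensor(librosa.power_to_db(mel), dtype=torch.float32)

embedder = nn.Sequential(                  # toy CNN feature extractor
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 128),
)

def embed(wave):
    spec = spectrogram(wave).unsqueeze(0).unsqueeze(0)  # (1, 1, mels, frames)
    with torch.no_grad():
        return embedder(spec).squeeze(0)

a = embed(np.random.randn(16000).astype(np.float32))  # stand-in utterances
b = embed(np.random.randn(16000).astype(np.float32))
score = torch.cosine_similarity(a, b, dim=0)
print("same speaker" if score > 0.7 else "different speaker", float(score))
```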

    A Visual Context Ontology for Multimedia High-Level Concept Detection

    The notion of context plays a significant role in multimedia content search and retrieval systems. In this paper we focus our research efforts on a visual context knowledge representation to be utilized for multimedia high-level concept detection. We propose and describe in detail the types of contextual relations evident within multimedia content, model them, and provide a clear methodology for extracting them. A visual context ontology is introduced, containing relations among different types of content entities, such as images, regions, region types, and high-level concepts. In this manner, we facilitate traditional object detection approaches towards semantic interpretation. The application of the proposed knowledge structure yields encouraging initial results, improving the efficacy of related multimedia analysis techniques.
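
    As an illustration of the kind of knowledge structure described above, the sketch below encodes contextual relations as typed triples linking content entities (images, regions, region types, high-level concepts). All entity and relation names are hypothetical, not the ontology's actual vocabulary.

```python
# Contextual relations as typed triples over content entities.
# Entity and relation names are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str

ontology = {
    Relation("image42", "contains-region", "region7"),
    Relation("region7", "has-region-type", "sand"),
    Relation("sand", "co-occurs-with", "sea"),
    Relation("sea", "supports-concept", "beach"),
}

def related(entity, predicate):
    """Follow one contextual relation outward from an entity."""
    return [r.obj for r in ontology
            if r.subject == entity and r.predicate == predicate]

# Contextual evidence: a region typed "sand" makes "beach" more plausible.
print(related("sand", "co-occurs-with"))   # ['sea']
```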

    Semantic multimedia analysis and processing

    Broad in scope, Semantic Multimedia Analysis and Processing provides a complete reference of techniques, algorithms, and solutions for the design and the implementation of contemporary multimedia systems. Offering a balanced, global look at the latest advances in semantic indexing, retrieval, analysis, and processing of multimedia, the book features the contributions of renowned researchers from around the world. Its contents are based on four fundamental thematic pillars: 1) information and content retrieval, 2) semantic knowledge exploitation paradigms, 3) multimedia personalization, and 4